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ABSTRACT 


This  paper  discusses  likelihood  ratio  derivative  estima¬ 
tion  techniques  for  stochastic  systems.  After  a  brief  review 
of  the  basic  concepts,  likelihood  ratio  derivative  estimators 
axe  presented  for  the  following  classes  of  stochastic  processes: 
time  homogeneous  discrete-time  Markov  chains,  non-time  ho¬ 
mogeneous  discrete-time  Markov  chains,  time  Homogeneous 
continuous-time  Markov  chains,  semi-Markov  processes,  non¬ 
time  homogeneous  continuous-time  Markov  chains,  and  gen¬ 
eralized  semi-Marsov  processes. 


1.  INTRODUCTION 


In  recent  years,  an  extensive  literature  ha*  begun  to 
develop  within  the  simulation  community  on  efficient  esti¬ 
mation  of  derivatives  of  performance  measures  with  respect 
to  decision  parameters.  In  this  paper,  we  shall  focus  on  de¬ 
scribing  the  basic  ideas  that  underlie  a  recently  introduced 
derivative  estimation  method  known  as  likelihood  ratio 
derivative  estimation  (also  known  as  the  efficient  score 
method).  This  technique  has  been  previously  described  in 
GLYNN  (1986,  1987),  REIMAN  and  WEISS  (1986),  and  RU¬ 
BINSTEIN  (1986). 

In  Section  2,  we  describe  the  basic  likelihood  ratio 
derivative  estimator  in  a  general  setting  in  which  the  essential 
idea  is  most  transparent.  Section  3  specialises  the  estimator 
to  discrete-time  stochastic  processes.  We  derive  likelihood 
ratio  derivative  estimators  for  both  time  homogeneous  and 
non-time  homogeneous  discrete-time  Markov  chains.  In  Sec¬ 
tion  4.  we  conclude  the  paper  with  a  discussion  of  likelihood 
ratio  derivative  estimation  in  continuous  time.  We  present, 
as  examples  of  our  analysis,  the  derivative  estimators  for: 
time  homogeneous  continuous-time  Markov  chains,  non-time 
homogeneous  continuous- time  Markov  chains,  semi-Markov 
processes,  and  generalized  semi-Markov  processes.  In  all  our 
examples,  we  require  that  the  performance  measure  corre¬ 
spond  to  s  terminating  simulation. 

A*  mentioned  earlier,  the  likelihood  ratio  derivative  es¬ 
timation  technique  has  been  previously  investigated  in  a 
number  of  different  papers.  Our  main  contribution  here  is 


to  specialize  the  general  idea  underlying  this  family  of  deriva¬ 
tive  estimators  to  the  various  classes  of  stochastic  processes 
described  above. 


3.  LIKELIHOOD  RATIO  DERIVATIVE 
ESTIMATION 

In  this  aection,  we  provide  a  brief  introduction  to  the 
basic  idea*  that  underlie  likelihood  ratio  derivative  estima¬ 
tion.  To  set  the  stage,  consider  a  family  of  stochastic  sys¬ 
tems  that  is  indexed  by  a  scalar  decision  parameter  9 .  For 
example  in  a  queueing  context.  0  might  correspond  to  the 
service  rate  at  a  particular  station.  Given  the  sample  space 
f2,  let  X(9,  Lj)  be  the  sample  performance  measure  observed 
at  sample  outcome  w  and  decision  parameter  9;  we  permit 
X(9 ,  w)  to  depend  explicitly  on  9  in  order  to  encompass  sit¬ 
uations  in  which  the  “coat”  of  running  the  stochastic  system 
(as  measured  through  X (9) )  depends  on  the  parameter  9. 
(However,  in  many  estimation  settings,  X(9)  is  independent 
of  9  and  therefore  depends  only  on  U) .)  In  addition,  the  prob¬ 
ability  distribution  P/  on  Q  typically  depends  on  9;  Pf  then 
reflects  the  manner  in  which  the  random  environment  is  af¬ 
fected  by  the  decision  parameter.  The  performance  measure 
0t(9)  associated  with  parameter  value  9  is  then  defined  as 
the  expectation 

a(9)  =  !  X{9,u>)Pt(du). 

J  n 

Our  goal  is  to  describe  an  estimation  methodology  for  calcu¬ 
lating  a'(9o)- 

The  likelihood  ratio  method  for  derivative  estimation 
is  based  on  the  following  idea.  Suppoae  that  there  exists  a 
measure  fi  (not  necessarily  a  probability  measure)  such  that 
P$(du)  =  f(9,u)n(du)  i.e.  f(9,  ■)  is  the  density  of  P) 
with  respect  to  fi.  Then, 

a{0)  =  [  X(9,u)f{9^)fi(dw). 

J  n 

Assuming  that  the  derivative  and  integral  can  be  inter¬ 
changed,  we  obtain 
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a'(90)  =  /  X'(90,u)f(9Q^)fi(du) 

n  (2.D 

X(9Q,u)f'(90,u)ft(du). 

We  note  that  the  first  term  on  the  right-hind  side  of  (2.1)  is 
just  Es0X'(9q )  (where  E)(  )  denotes  the  expectation  oper¬ 
ator  associated  with  P«).  Since  this  term  can  be  represented 
as  the  expectation  of  a  r.v.,  standard  Monte  Carlo  methods 
may  be  applied  to  estimate  it.  Specifically,  suppose  that  one 
simulates  i.i.d.  replicates  of  X'(6q)  under  distribution  Pja ; 
the  sample  mean  of  these  observations  then  converges  (at  rate 
n-1/2  in  the  number  n  of  observations)  to  the  first  term. 

To  handle  the  second  term  using  Monte  Carlo  methods, 
we  need  to  represent  it  as  the  expectation  of  a  r.v.  To  ac¬ 
complish  this,  suppose  that  J(u>)  is  a  non-negative  function 
such  that 


see  GLYNN  and  IGLEHART  (1969)  for  further  details.  Un¬ 
fortunately,  the  optimal  sampling  density  g ’  basically  re¬ 
quires  knowledge  of  the  integral  (appearing  in  the  second 
term  in  (2.1))  that  we  are  trying  to  estimate.  Therefore,  the 
choice  of  g'  as  defined  by  (2.5)  is  typically  impractical  to 
implement . 

We  now  describe  a  popular  alternative  to  g“ .  Suppose 
that  the  densities  f(9,u>)  are  such  that  for  9  in  an  open 
neighborhood  of  8q  , 

A(0)  =  {u>  :  f(0,ui)  >  0}  is 

.  (2.6) 

independent  of  9. 

Then,  f(6 o,w)  =  0  implies  that  f (0  ,  U ) )  vanishes  in  a 
neighborhood  of  9,  from  which  it  follows  that  }'(9q,  u)  —  0. 
so  that  f'(9o,Lj)X(9o,  w)  =  0.  Thus,  g(u>)  — 

f(90,ui)n(dv)  satisfies  both  (2.2)  and  (2.3).  In  this  case, 


L 


g(u)n(du)  =  1. 


(2.2) 


H(90,u) 


f'(9  „,w) 
f(6  o.w) 


\ogf(90<uji 


(2.7) 


Then,  the  measure  P(dui)  =  g(u/)fi(du)  is  a  probability 
distribution  on  SI.  If  g  has  the  additional  property  that 

|X((?o,  w)/'(0o,u>)|  >  0  implies 
that  j(w)  >  0, 

then  we  can  represent  the  second  term  as 

f  X(0O ,u)£y&2lg(u)rtdu)  =  EX(8O)H(0 o) 

J  n  9\u) 

(2 .4) 

where  H(9q,u/)  —  f'(9o,u))/g(u)  and  £(■)  denotes  ex¬ 
pectation  relative  to  the  probability  P.  (Note  that  (2.3)  is 
required  to  avoid  dividing  by  sero  m  (2.4).)  Given  the  repre¬ 
sentation  (2.4)  of  the  second  term  as  an  expectation,  we  can 
now  easily  apply  Monte  Carlo  methods  to  estimate  it  (in  the 
same  way  as  for  tbs  first  term). 

We  now  tun*  to  the  question  of  selecting  the  sampling 
density  g.  The  theory  of  importance  sampling  asserts  that 
the  choice  of  g  which  minimises  the  variance  of  the  observa¬ 
tions  of  X(9q)H(0q)  is 


the  right-band  side  of  (2.7)  is  known  as  the  likelihood  ratio 
derivative  (because  H(90,u>)  =  “  the  deriv»- 

tive  of  the  quantity  know  in  the  statistics  literature  as  the 
likelihood  ratio  of  P$  with  respect  to  P»o)- 

This  choice  of  g  has  an  important  advantage.  Note  that 
if  we  sample  outcomes  U >  according  to  / (f?o,  w)/i(dw) ,  we 
can  use  the  c.v.’t  X(0o),  X' (9q) ,  and  X (9q) H (9q)  to  esti¬ 
mate  Oi(9o)  and  both  the  terms  appearing  on  the  right-nand 
side  of  (2.1)  simultaneously.  Thus,  with  this  choice  of  g,  we 
may  estimate  ar(0o)  and  Oi'(9q  )  using  the  original  sampling 
distribution  associated  with  parameter  9o  ■  At  the  same  time, 
it  should  be  noted  that  there  are  important  problem  classes 
(e.g.  rare  event  simulations)  in  which  much  better  choices  of 
g  can  be  made  (better  in  the  sense  of  smaller  variance). 

We  dose  this  section  by  recalling  that  to  derive  (2.1). 
an  interchange  of  the  differentiation  and  expectation  oper¬ 
ators  was  required.  In  virtually  all  practical  examples,  the 
interchange  is  valid  under  mild  additional  regularity  assump¬ 
tions  on  the  problem.  As  a  consequence,  we  shall  ignore  this 
interchange  issue  throughout  the  remainder  of  this  paper. 

3.  LIKELIHOOD  RATIO  DERIVATIVE 
ESTIMATION  IN  DISCRETE  TIME 


9  /nix(0o,w)m,wM^)’ 


(2.5) 


In  this  section,  we  specialize  the  discussion  of  Section  2 
to  the  case  where  X  (9 ,  w)  is  a  sample  performance  measure 
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associated  with  a  discrete-time  sequence  V  —  (Yn  :  n  >  0) 
talcing  values  in  a  discrete  state  space  S-  Specifically,  we 
suppose  that  fl  =  S  X  S  X  .  .  .  and  that  Yn  is  the  coordinate 
rv-  Vn(.j )  =  Wn  for  LJ  —  (uio,  ui\ ,  .  .  .)  6  fi.  We  assume 
that  A  ($)  takes  the  form 

X(9)  =  h(9,Y0,Yu...), 

for  some  real- valued  function  h.  Since  S  is  discrete,  there 
exist  joint  probability  mass  functions  po,  Pi. -  -  -  such  that 


P»{Yo  =  yo . Yn  =  y«}  =  pn(0,  yn)  (3.1) 

where  yn  =  (y0,  .  .  .  ,yn).  Letting 


Pn  ($1  y  n  —  1  >  Vn  ) 

=  P){Yn  =  yn\Yo  =  yo.-  -.Vn-i  -yn-i), 

we  can  write  (3.1)  as  the  product 

p»  { V0  =  yo,  ■  •  • ,  Y„  =  yn  } 

n-  1 

=  po(0,yo)  n  p*(tf>  y*;y*+0 

4=0 

Suppose  now  that  X  (9)  is  a  function  of  Y  up  to  some  finite 
(deterministic)  time  horizon  m,  so  that  X(9)  =  h(0,Ym) 

where  Y m  =  ( Vj.  •  •  •  ,  Ym).  To  apply  the  idea  of  Section  2, 
we  need  to  obtain  a  representation  Pi((Lj)  — 
f  (9 ,ui)p(dw)  for  some  measure  fl.  But  observe  that  for 


n— 1 

P»(<Li)  =  p*(0;wo)  JJpk(^,w*;wt+1)/im(dw) 

4=0 

w  4  =  (wo.  .  -  -  ,u>k)  and  /im  is  counting  measure  on  Om  — 
S  X  S  X  ■  •  •  X  S  (m  +  1  times).  Hence,  we  may  take 


m-l 

/(0,w)  =  p(9,  w0)  p(9,  W4iW4  +  l), 

*=0 


so  that 

m  —  1 

/'(0o,w)  =  p'(0o,  W0)  P4(^0.  ^4  I  W4+1) 

4  =  0 

m-l 

+  P(^0,Wq)5~> P4(0Q.u,4;<*'4  +  l)j"[pj(flo,  + 

4=0  ;*4 

(3.3) 

We  can  simplify  the  above  formula  somewhat.  We  claim  that 
if  p*(0o,u>  t,wt+i)  ^  0,  it  must  follow  that 

Pk(&0,  w4.Wt+i)  >  0.  Forsupposethatpt(0o.u;4  >'4+l) 

=  0.  Then  it  follows  that 


Pk(9o  +  h,  u/tjwt+i)  —  p'i(0o,  k\uk+i)h  +  o{h) 

ash  1  0,  from  which  it  is  evident  that  pt(0o  +  ^.  w  4  ) 

<  0  for  some  h.  But  pk(d,  wk IWfc+i)  is  a  mass  function 
and  hence  must  be  non- negative.  This  contradiction  guaran- 
tees  that  Pk(0Oi  wkiwk+l)  >0.  A  similar  argument  shows 
that  po(9o,u>o)  >  0  whenever  p'0(9o,u/o)  ^  0.  Hence,  we 
may  write  (3.3)  as 

m,w)  = 

/(fl0lW)  p_o(go^o)  + 

[Po(^o.wo)  pt(0o,  w*;w4+i). 

Suppose  that  we  choose  a  g  such  that 
ffim  g{u>)fim(du)  —  1  and  /(®o.w)  >  0  implies  that 
g( U/)  >  0,  then  (2.3)  is  automatically  in  force.  (In  particu¬ 
lar,  setting  g(w)  —  f(9o,u>)  works.)  Hence,  we  find  that 


a'(*o)  =  EtoX'(90)  +  E,X(90)H(90)  (3,4) 

where  Et(  )  denotes  the  expectation  operator  associated 
with  the  probability  Pt(du)  =  g(u)fin(du/),  E$(  )  de¬ 
notes  expectation  relative  to  Pi ,  and 

H{90)  = 

f(9p  i  Vm )  Po(^o,  Vo)  +  y^1  PtC^o.  V4;  V44-1 

9(Ym)  [po(S°-yo)  4=oP4(«o.V*;Vk+1. 

The  same  argument  can  be  extended  to  a  certain  class 
of  random  time  horizons.  In  particular,  suppose  that  T  is 
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a  stopping  time  with  respect  to  Y  i.e.  for  each  m  > 

—  rn)  =  km(Y m)  for  some  function  km .  We 
assume  that  the  performance  measure  X(0)  is  a  function  of 
the  path  of  Y  up  to  the  random  time  horizon  T  i.e.  there 
exists  a  family  of  functions  ho,  hi,..  .  such  that 

co 

*(*)=  rm)/(T  =  m) 

m=0  (3-5) 

=  hj{9,YT)I{T  <  oo). 

As  in  the  derivation  of  (3.4),  we  need  to  represent  Pf  as 
P){cLj)  =  f(9, u)fi(du).  Lctfij-  =  U~_0{w„,  G  Qm  : 

—  1}  and  note  that  for u/  =  (o/o ,  U)\ ,  .  .  .  , Uf  )  G 

n.. 

P,{du)  = 

7-1  -  (3.6) 

Po(^w0)nP*«?,wt;wt+l)Mr(^) 

*  = 0 

where  fix  <»  counting  measure  on  fiy.  Suppose  that  0  is 
chosen  as  a  non-negative  function  on  Qx  having  the  property 
that  Jng(u/)vT(du)  =  1  and  Po(0o, W0)  flljo 

Pk(^0,UJk',^>k+l)  >  0  implies  that  g( w)  >  0  forw  €  fly. 
By  combining  (3  5)  and  (3.6)  and  proceeding  as  in  the  derivw- 
tion  of  (3.4),  we  obtain  the  following  stopping  time  general¬ 
ization  of  (3.4): 


a'(0o)  =  EtoX'(0  0)  +  E,X(Oo)H(0o)  (3.7) 

where 

H(8q)  =  Po^°’  FlIJo  Pk(fio,Yk',  Yk+i) 

9(Yt) 

Po(to  ,Yg)  +  p^flo.yt;  y>.h) 
M»o,Yo)  f^Pi(90,Yk-,Yk+i).  ' 

As  in  the  case  of  (3.4),  one  possible  choice  of  g  is  /(*>).  in 
which  event  (3-7)  simplifies  to: 


a'(0o)  =  E,9[X'(90)  +  X(9o)H(0o)]  (3.8) 

where 


H(0o)  = 

Pojg^Vo)  +  y  p'k(8o’Y k\Yt+i) 

Po(«o,y0)  f^Pk(90,Yk-,Yk+l)' 

\/e  now  give  a  couple  of  examples  to  illustrate  (3.7)  and 

(3.8) . 

(3.9)  EXAMPLE.  Suppose  that  under  distribution  Pg.Y  is 
a  Markov  chain  with  initial  distribution  /i(0)  and  transition 

matrix  P(9).  Assume  that  X(0)  =  hx(Y j)/(T  <  oc) 

(with  T  a  stopping  time),  so  that  a(9)  =  Ef{hx(Y r): 
T  <  oc}.  Then,  (3.8)  yields 

a'(90)  =  Eia{hx(Yx)E(90);T  <  oo},  (3. 10) 

where  H(0o)  =  p'( 90l  Yo)/tl(0o,  Y0)+ 

H*=0  P'(0o,Yk,Yk+i)/P(9o,Yk,Yk^).  In  certain  set¬ 
tings,  the  estimator  suggested  by  (3.10)  may  have  a  large 
variance  (e.g.  rare  event  simulation).  For  such  problems, 
suppose  that  we  select  g  to  satisfy  the  positivity  conditions 
stated  earlier.  Then 


a'(0o)  =  E,{hx(Yx)H(90)\T  <  oo},  (3.n) 
where  H(90)  =  »(90,  y0)Il2jo  **(*0.  Yk  ,  Yk+l ) 

T-l 

myt)  ■  wv  0,  Yo)/P(0o,  Y0)  +  £  m.  n.  n+i) 

1  =  0 

/m>,n,n+l)].  In  *  “rate  event"  setting,  one  would 
typically  choose  £  so  as  to  bias  the  system  to  force  the  oc¬ 
currence  of  more  rare  events. 

(3.12)  EXAMPLE.  In  this  example,  we  assume  that  under 
Pi,  Y  is  a  Markov  chain  with  non- stationary  transition 
probabilities,  so  that  Pi{Yk+l  =  yk+l\Yk  =:£/}  = 

Pk(0,yk,yk+ 1).  Then,  if  a(0)  =  Et{hx{YT),T  <00}, 
(3.8)  yields 


q'(0o)  =  Et0{hT(YT)H(8o)\T  <  00} 

where  H (0f>)  =  p'tfo,  Yo)/fi{90,  Yo)+ 

I37=o  w,n,n+i)/p(«o,n,n+i) ;  n>e  obvious 

analog  of  (3.11)  can  also  be  written  down. 
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4.  LIKELIHOOD  RATIO  DERIVATIVE 
ESTIMATION  IN  CONTINUOUS  TIME 

This  section  it  devoted  to  generalizing  the  idea*  of  Sec¬ 
tion  3  to  continuous- time  discrete-event  dynamical  systems. 
We  view  X(9,  ui)  at  a  sample  performance  measure  asso¬ 
ciated  with  a  continuous-time  process  (Y  =  y(()  :  t  >  0) 
taking  values  in  a  discrete  state  space  S .  The  process  Y  is  as¬ 
sumed  to  be  piece-wise  constant  with  jump  times 
Si,  Si,...  ( Sn  — >  OO  as  n  — »  00).  Hence,  if  So  =0  and 
V’n  =  V(Sn)  ,  we  may  write 

OO 

y(<)  =  £yn/(Sn<*<sn+I). 

n  =0 

Let  An  =  Sr,+ 1  -  Sn  and  put  Zn  =  (Vni  A„).  We 
suppose  that  (1  =  S  X  5  X  ...  where  S  —  S  X  [0,  OO) 
and  that  Zn  is  the  co-ordinate  r.v.  Zn(u)  =  for  W  — 

(ss'o  ,uii ,  .  .  .)  £  Q. 

In  order  to  proceed  in  parallel  with  the  development  of 
Section  3,  we  shall  require  that  the  distributions  P$  on  Q 
have  the  property  that  there  exist  measures  po .  Pi  i  ■  •  •  such 
that 

Pf{Zo  €  dz0)  =  Po(0,  zo)po(^o) 

^{^n+1  €  dzn+l\Z n  =  2n} 

=  Pn(9 ,  tn,  2n+i  )/in(  2  n  ,  dzn+  j  ) 

where  Zn  =  (Zo,  ■  ■  ■  ,  Zn)  and  Zn  =  (^O.  ■  •  •  .  2n)  € 
S  X  ...  X  S  —  n„  ((n  +  I)  times).  Then,  analogously  to 
(3.2),  we  may  write 

P»{Zn  G  dln)  = 

Vrl  -  -  (<■!) 

Po(0,  20)  )Hn(din) 

ksO 

n— 1 

where  fi„(dz„)  =  fio(dzo)  J"[  Pk(  *  k ,dZk+i). 

k  =1 

Suppose  now  that  we  consider  a  performance  measure 
X(6)  that  is  a  function  of  the  path  up  to  horizon  T;  this 
obviously  includes  any  performance  measure  that  depends  on 
y  up  to  time  Sr+l  ■  As  in  Section  3,  we  require  that  T  be 
a  stopping  time  with  respect  to  Z  —  (Zn  '■  n  >  0)  i.e.  for 
each  m  >  0, 1(T  —  m)  —  km(Zm)  far  some  function 
km.  Then,  the  performance  measure  X(9)  may  be  written 
in  the  form 


x(e)  =  J2^,Zk)i(T  =  k) 

k= 0 

=  hT(zT)I(T  <  oc). 

Let  nT  =  U“=0{  z  m  €  Hm  :  km(uim )  =1}  and 
note  that  for  2  t  —  (ro,  .  .  .  ,  Zt)  €  we  may  extend 

(4.1)  to 

P»{Zt  €  dzr)  = 

T- 1 

Po(^.2o)  Pk(9,~z k-,zk+l)^T{dz ,) 

fc= 0 

T-l 

where  pr(dzr)  —  Mo(dzo)  Pk(  2  k,dzk  +  i ).  By  ar- 

k=0 

guing  identically  as  in  Section  3,  we  obtain  the  following 
continuous-time  generalization  of  (3.7).  Suppose  g  is  chosen 
as  a  non-negative  function  on  fix  having  the  property  that 
fnT  g(ZT)Pr(dZT)  —  1  and  Po(0o,  2o)  fit Jo 

Pk(d 0,  2  k,  2*+i)  >  0  implies  that  g(  2  j  )  >  0  for  z  t  £ 
fl T-  Then,  if  Et(-)  is  the  expectation  operator  associated 
with  P(d Z  t)  —  g(  z  x)p(d z  t),  we  obtain  the  derivative 
representation 

a'(0o)  =  EtoX'(90)  +  EtX(90)H(90) 
for  o(9)  —  E${h(0,  Zt)>T  <  Oo},  where 

T-l  _  __ 

H(9o)  =  Po(0o.  Zo)  PJ  Pk{9 o,  Zk',Zk+\)/g(ZT) 

k=0 

P'0(Oq,Zq)  +  y-1  p’k(90,  Zk\ Zk+l) 

.Po(9o,Zo)  t='oPk(6o,Zk-,Zk+i).  ’ 

(4.2) 

As  in  Section  3,  one  possible  choice  for  g  is  g(  Z  j) 
=  Po(0O.2o)n*Jo  P*(0 o.  2*;  2k+i).  in  which  case  P  is 
identical  to  Pft ,  yielding 


<*'(*o)  =  E,0(X'(6Q)  +  X(6o)H(9o)]  (4.3) 


where 


TT/n  \  _  PWo,Zo)  ,  Pt(^0>  Zk\ Zk  +  l) 

-  TTe — T~\  +  - = - ■ 

Po(e0,z0)  f^oPii0o  ,zk-,zk+l) 

We  shall  now  illustrate  these  formulae  with  some  examples. 


a'(0o)  =  £,0(/»(V(s)  :  0  <  s  <  t)H(9 0)] 


where 


(4.4)  EXAMPLE.  Suppose  that  under  Pg  ,Y  is  a  continuous- 
time  Markov  chain  with  initial  distribution  and  gener¬ 

ator  Q(9).  Assume  that  X(9 )  —  /i(Y(s)  :  0  <  S  <  t). 
Then,  X(9)  can  be  represented  as 

X(6)  =  h(Z0,  Z 1,  .  .  .  ,  Zj)  where  T  ia  the  (topping  time 
T  =  inf { I*  >  0  :  £*_0  A*  -  0-  Set  =  (VnJn) 
(recall  that  Zn  G  S  —  S  X  [O.oo)).  Then, 


Pg{Z o  G  dzn}  =  Pa(0,zo)Po(dzo) 

where  po(8.y0,t0)  =  p(9 ,  yo)<?(0 ,  yo)  exp(-q(9 ,  y0)<o). 
<?(0,y)  =  -<2(0,y,y).  «nd  /i0(dzo)  is  the  product  of 
counting  measure  and  Lebesgue  measure.  Furthermore, 

Pg{Zn  +  i  G  dzn+l\Zn  “  2fl}  *“ 

Pn(9,  Z  „■  Zn+i  )Pn(  2  n,  dzn+i  ) 

where  pn(9,Zn;Zn+ 1)  =  Q(Q ,  yn ,  yn+l)?(0, 1/n+l ) 

exp(-9(9,y„  +  i)fn+1)/9(fl,yn)  *"d  pn(7„, dzn+i )  is 
again  the  product  of  counting  measure  and  Lebesque  mea¬ 
sure  .  Formula  (4.3)  now  becomes 


Hie  i  -  v 

1  o;  p(80,yq)  +  /?(o0,n,n+i) 

Y'  /'(^o.  y*.  At) 

*=o  /(fo.tt.A*)- 

(4.6)  EXAMPLE.  In  this  example,  we  show  that  (4.3)  easily 
handles  the  case  where  the  process  is  non-time  homogeneous. 
In  particular,  suppose  that  under  Pg,Y  is  a  non-time  ho¬ 
mogeneous  continuous-time  Markov  chain  with  initial  distri¬ 
bution  fi(9)  and  time-dependent  generator  Q(9 ,  t).  Then. 


Pg{Yn+ 1  —  y,  An+1  G  dtjZn} 


Q(9,Sn+uYn,y ) 
9(9,  ■S’n+i,  Y„) 


q(9,Sn+ 1  +  t,y) 


9(9,  Sn+ 1  +  u,y)du)dt 


where  q(6,  t,  y)  =  —Q(9,t,y,y).  Suppose  that  X(0)  = 
h(Y(s)  ■'  0  <  s  <  f).  If  we  put  T  =  inf {n  >  0 
^t_Q  At  >  f  },  then  (4.3)  takes  the  form 


a'(flo)  =  :  0  <  a  <  t)H(0o)] 


a'(9o)  =  £#.[A(K(«)  :  0  <  5  <  t)H(0 0)] 


where 


where 


ff(*o)  = 


p'(flo,yo)  y  ^(go.n.n-n) 
^o,y0)  ft*  Wo,n.n+i) 


,  Mo,Yt) 
9(«o,YT) 


-  jy(*o,tt)A*. 


*=0 


rue  \  I  y 

n{0o)-  n(0o,Yo)  +  g((o,s*+l,n.y*+i) 


m.Sr+i.rr) 
?(*o,  Sr+ i,Vr) 


q'(0o,t,Yk)dt. 


(4.5)  EXAMPLE.  Suppose  that  under  Pg ,  V  is  a  semi- 
Markov  process  with  initial  distribution  fi(9 ),  jump  matrix 
f?(0),  and  holding  time  distributions  (F(0,  X,  dt)  :  X  G  S). 
Suppose  that  for  each  Z,  /’(fl,  Z,  df  )  =  /(# ,  Z,  f)/i(z,  df ) 
for  some  measure  ji .  Assuming  that  X(9)  —  h(Y  (s)  :  0  < 
s  <  t),  we  again  put  T  —  inf{tl  >  0  :  ^t-0  At  >  f}. 
Formula  (4.3)  becomes 


(4.7)  EXAMPLE.  We  now  suppose  that  Y  is  a  generalized 
semi-Markov  process  (GSMP)  under  Pg\  see  GLYNN  (1983) 
for  further  details  on  GSMP’s.  Let  E  be  the  event  set  of  the 
GSMP.  The  initial  state  of  the  GSMP  is  chosen  according 
to  the  distribution  p(9),  whereas  the  initial  clock  readings 
are  chosen  from  the  distributions  F(9 ,  e ,dt) ,  for  e  G  E 
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When  clock  t  initiates  a  transition  from  state  y,  the  next 
state  is  chosen  from  the  mass  function  p(9,  ;y,e).  Typ¬ 
ically,  when  the  GSMP  enters  a  new  state,  certain  clocks 
need  to  be  stochastically  reset.  We  assume  that  the  distri¬ 
bution  used  to  reset  clock  t'  in  state  y'  when  a  transition 
just  occurred  from  state  y  with  clock  e  as  triggering  event 
is  given  by  F(9 ,  t' ,  y1 ,  e,  y,  dt).  We  require  that  there  exist 
measures  /j(e,  dt),  p(e' ,  y',  e ,  y,  dt)  such  that 


F(9,e',y',e,y,dt) 

=  f(6,e',y',e,y,i),n(e',y',e,y,dt) 


F(9,  e,  dt)  =  f(9,e,t)n(e,dt). 


H(90) 

=  /Ago Jft)  ^ 

^  p(«o.n+i;n.e-(c*)) 

/'(^o.e.Cp,)  ^ 

V  /(flo,e,C0.) 


m,e,n,e-(ct.i),n 


«eN(ys;V'k_,sfc*.1)r 


f(80,e,Yk,e-{Ck.l),Yk 


-iA«) 

-.4)' 


The  above  examples  serve  to  illustrate  the  great  variety 
of  stochastic  processes  to  which  likelihood  ratio  derivative 
estimation  may  be  applied. 


In  a  strict  sense,  the  analysis  of  this  section  does  not  ap¬ 
ply  to  GSMP's.  since  the  appropriate  state  descriptor  for  a 
GSMP  includes  the  value  of  ail  the  clock  readings.  Such  a 
state  descriptor  can  not  typically  be  encoded  as  an  element 
of  S  =  S  X  [0,  oo).  However,  a  close  examination  of  the 
analysis  given  earlier  shows  that  the  essential  feature  was  that 
{Yn  ,  An )  be  representable  as  a  simple  function  of  the  process 
Zn ;  Zn  need  have  no  structure  beyond  (4.1).  In  particular, 
Zn  need  not  be  an  element  of  S  ■  In  the  GSMP  setting,  the 
natural  candidate  for  Z  is  the  tuple  Zn  ~  (Yn,  Cn ),  where 
c„  is  the  vector  that  describe  the  residual  amount  of  time 
left  on  each  of  the  clocks  that  are  active  in  state  Yn .  Clearly, 
A„  is  a  simple  function  of  Zn  (in  a  GSMP  with  unit  speeds, 
An  is  just  the  minimal  element  in  Cn);  furthermore,  under 
(4.8),  the  distribution  P $  for  Zn  can  lx  written  in  the  form 
(4-1). 


Let  N(l/ ;  y,  e)  be  the  set  a!  clocks  active  in  {/  that 
need  to  be  stochastically  re-set  when  a  transition  from  y 
just  occurred  with  event  t  as  the  trigger.  We  further  de¬ 
fine  e*  (c)  to  be  the  index  of  the  triggering  event  associated 
with  clock  vector  C;  we  assume  e*  is  uniquely  defined  for 
each  C.  Suppose  X(S)  ~  h(Y(i)  :  0  <  »  <  1).  If  we  put 
T  =  inf{n  >  0  :  —  O'  *»  easily  verified  that 

(4.3)  takes  the  torn 
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